To explore the data, you can simply download the zip file from the GitHub repository (here).
!!! If you do so, none of the changes you make will affect the GitHub repository !!!
To contribute officially to InSciOut, you need to be able to push your changes to the remote server (i.e., the GitHub repository). This process is explained in a later section; for now, we will review how the project is structured and how the different files and folders are meant to be used.
There are different folders in the project and this is how they are meant to be used:
- database: this is where the InSciOut database, InSciOut.sqlite3, is stored. It also contains the following folders:
  - ViewQueries: the SQL code used to create the views in InSciOut.sqlite3. The files are named following the pattern View001.sql, View002.sql, etc. Note that views are virtual tables that you can create from the information available in the database. Learn more about Views here.
  - CSVTables: the CSV table versions of the views.
- rawdata: this is where the Excel spreadsheets used to create the database are stored. These spreadsheets were filled in manually by our team; a Python program then extracted their data to create a database organised into entries and columns. If you want to add an Excel spreadsheet, it must be placed inside a folder. Each folder represents a project/category. The name of a spreadsheet's parent folder is included in the reference number of the extracted entries when they are exported to the database. Learn more about the Database organisation here.
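To make the idea of a view as a virtual table concrete, here is a minimal Python sketch using the standard sqlite3 module. It runs against a throw-away in-memory database, so InSciOut.sqlite3 is never touched. The table `press_releases`, its columns and its rows are hypothetical and do not reflect the actual InSciOut schema.

```python
import sqlite3

# In-memory database: nothing is written to disk.
con = sqlite3.connect(":memory:")

# Hypothetical table standing in for one of the database tables.
con.execute("CREATE TABLE press_releases (reference TEXT, sample_size INTEGER)")
con.executemany(
    "INSERT INTO press_releases VALUES (?, ?)",
    [("projA-001", 120), ("projA-002", 45)],
)

# A view stores a query, not data: it is re-evaluated each time it is read.
con.execute(
    "CREATE VIEW view001 AS "
    "SELECT reference FROM press_releases WHERE sample_size >= 100"
)
rows_before = con.execute(
    "SELECT reference FROM view001 ORDER BY reference"
).fetchall()

# Adding a row to the underlying table changes what the view returns.
con.execute("INSERT INTO press_releases VALUES ('projB-001', 300)")
rows_after = con.execute(
    "SELECT reference FROM view001 ORDER BY reference"
).fetchall()

# Views are listed in sqlite_master alongside tables.
view_names = [
    name
    for (name,) in con.execute("SELECT name FROM sqlite_master WHERE type = 'view'")
]

print(rows_before)
print(rows_after)
print(view_names)
```

The view returns one more row after the insert even though the view itself was never modified, which is what "virtual" means here.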
You can find tutorials that review the database organisation and show how to create your own View/CSV Tables:
Note that if you want to make an official contribution, you will need to use the SQL method (which is also the simplest). The SQL template used to extract data from the database (using Views) is very accessible and easy to learn.
Rules and Conventions:
- save a backup version of the SQL code used to create your View in ./database/ViewQueries
- export the CSV table version of your Views in ./database/CSVTables
- name your views something like view001 rather than JAPR_language_73
- when you add a view to the database, please add its documentation in ./documentation/Views_Reference.Rmd.
- use short variable names such as T1 instead of tmpT1 when making a temporary table with WITH
- keep the main comments that explain WITH, SELECT, FROM (you don't know which SQL files a new user will open first)
- organise the header comment in a pretty way, don't overload the comments, you can explain your code further in the Views Reference.
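The conventions above can be sketched end to end in Python. This is only an illustration under stated assumptions: it works in a temporary directory rather than the real repository, and the table `articles`, its columns, the query and the file name View001.csv are hypothetical. Only the folder names ViewQueries and CSVTables and the View001 naming pattern come from the rules above.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Temporary stand-in for ./database so nothing real is modified.
database_dir = Path(tempfile.mkdtemp())
queries_dir = database_dir / "ViewQueries"
csv_dir = database_dir / "CSVTables"
queries_dir.mkdir()
csv_dir.mkdir()

con = sqlite3.connect(str(database_dir / "demo.sqlite3"))

# Hypothetical table; the real schema is described in the documentation.
con.execute("CREATE TABLE articles (reference TEXT, word_count INTEGER)")
con.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("projA-001", 800), ("projA-002", 250), ("projB-001", 1200)],
)

view_sql = """\
-- View001: articles longer than 500 words (illustrative only)
CREATE VIEW View001 AS
-- WITH: temporary table, given a short name (T1)
WITH T1 AS (
    SELECT reference, word_count FROM articles WHERE word_count > 500
)
-- SELECT: columns kept in the view
SELECT reference, word_count
-- FROM: source of the rows
FROM T1;
"""
con.executescript(view_sql)

# Keep a backup of the SQL that created the view.
(queries_dir / "View001.sql").write_text(view_sql, encoding="utf-8")

# Export the view as a CSV table, header row first.
cursor = con.execute("SELECT * FROM View001 ORDER BY reference")
header = [column[0] for column in cursor.description]
csv_path = csv_dir / "View001.csv"
with csv_path.open("w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(cursor)
con.close()

# Read the export back to check its content.
with csv_path.open(newline="", encoding="utf-8") as handle:
    exported_rows = list(csv.reader(handle))
print(exported_rows)
```

Keeping the SQL file and the CSV export side by side means a collaborator can either rerun the query or read the result directly without opening the database.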
A GitHub repository is similar to a Google Drive or Dropbox folder: it mirrors a folder on your local machine. However, GitHub does not synchronise your files automatically; you need to explicitly upload your local folder (i.e. to push) to the remote repository (i.e. the GitHub repository). Then, each time you change your local folder, you need to explicitly push those changes to the remote repository.
This may seem cumbersome, but GitHub is designed for collaborating on text files (e.g. source code). With Google Drive, you could modify a file that one of your colleagues has also modified without noticing. Google Drive may then choose to overwrite your colleague's changes or to keep two copies of the file. GitHub, through Git, tracks every change and flags conflicting edits instead of silently overwriting them.
These actions need to be performed only once to get the InSciOut repository on your PC.
You need to:
In SourceTree, you need to add your GitHub account. This can be done during the installation, as described here, or after you run SourceTree, as described here.
GitHub is named after Git, which is version control software. Basically, it lets you take a snapshot of your local folder whenever you want (i.e. commit a change). By committing your changes each time you make meaningful progress, you can track your changes, go back to an earlier version, retrieve a file from an earlier snapshot, etc. A folder tracked by Git is called a Git repository; each snapshot is a "version" of the repository.
GitHub is made to work with Git repositories. When we first created the InSciOut folder on our machine, we initialised Git in it to make the InSciOut folder a 'Git repository'. Then we uploaded the Git repository (containing all the snapshots made with Git) to GitHub. Since then, each time we commit a change (i.e. take a snapshot), we can push it to the remote server (i.e. upload it to GitHub).
Initialization:
First commit:
First Push:
Second commit:
Second Push:
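The sequence above (initialisation, two commits, two pushes) corresponds roughly to the Git commands below, driven here from Python. This is a sketch under stated assumptions: it requires the `git` command-line tool, uses a local bare repository in a temporary directory as a stand-in for GitHub, and the file name notes.txt, the demo identity and the helper names `git` and `run_demo` are hypothetical. SourceTree is a graphical front end to Git, so you do not need to type these commands yourself.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output."""
    command = ["git", "-c", "user.name=Demo User",
               "-c", "user.email=demo@example.com", *args]
    result = subprocess.run(command, cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def run_demo():
    """Initialise a repository, then commit and push twice.

    Returns the number of commits on the remote after each push.
    """
    root = Path(tempfile.mkdtemp())
    remote = root / "remote.git"   # local stand-in for the GitHub repository
    local = root / "InSciOut"      # local working folder
    remote.mkdir()
    local.mkdir()
    git("init", "--bare", cwd=remote)

    # Initialization: turn the local folder into a Git repository.
    git("init", cwd=local)
    git("remote", "add", "origin", str(remote), cwd=local)

    counts = []
    for text, message in [("first version\n", "First commit"),
                          ("second version\n", "Second commit")]:
        (local / "notes.txt").write_text(text, encoding="utf-8")
        git("add", "notes.txt", cwd=local)          # stage the change
        git("commit", "-m", message, cwd=local)     # take the snapshot
        git("push", "origin", "HEAD", cwd=local)    # upload it to the remote
        counts.append(int(git("rev-list", "--count", "--all", cwd=remote)))
    return counts


if shutil.which("git") is None:
    print("git is not installed; skipping the demonstration")
else:
    print(run_demo())
```

The remote only gains a commit when `push` runs, which is the explicit upload step that distinguishes this model from automatic synchronisation.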
Because GitHub works with Git, it can manage conflicts when a file is modified by several people. It happens that several people start making changes from the same version of the repository. When these changes are brought together (for example, when you pull), Git reads the files in the repository and detects what each person changed relative to the previous version. If nobody changed the same lines, Git merges the changes automatically. If two people changed the same lines, Git reports a merge conflict. In this case, the users need to agree on what to keep and commit the resolved version.
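The rule described above can be illustrated with a deliberately simplified Python sketch. It is not Git's actual merge algorithm: it assumes all three versions have the same number of lines and compares them position by position. The function name `merge_lines` and the sample texts are hypothetical.

```python
def merge_lines(base, mine, theirs):
    """Merge two edited copies of `base`, line by line.

    Returns (merged_lines, conflicts), where `conflicts` lists the indices
    of lines that both sides changed in different ways.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles versions of equal length")

    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (same edit, or no edit at all)
            merged.append(m)
        elif m == b:      # only the other person changed this line
            merged.append(t)
        elif t == b:      # only I changed this line
            merged.append(m)
        else:             # both changed the same line differently
            merged.append(b)
            conflicts.append(index)
    return merged, conflicts


base = ["title", "intro", "methods"]
mine = ["Title", "intro", "methods"]

# The colleague edited a different line: merged automatically.
theirs = ["title", "intro", "Methods and data"]
clean_result = merge_lines(base, mine, theirs)
print(clean_result)

# The colleague edited the same line: reported as a conflict.
theirs_conflicting = ["TITLE", "intro", "methods"]
conflict_result = merge_lines(base, mine, theirs_conflicting)
print(conflict_result)
```

In the conflicting case the sketch keeps the base line and reports its index, leaving the decision to the users, much as a merge conflict does.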
While collaborating, you will be using this workflow (Centralized Workflow): https://www.atlassian.com/git/tutorials/comparing-workflows#centralized-workflow
Basically, you need to check the [rebase] box in the SourceTree preferences. Then, each time you want to share your changes:
- commit (if not already done)
- pull
- [solve conflicts, if any]
- push

Note that you can commit locally as many times as you want before sharing your changes.
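As a rough command-line equivalent of this commit, pull, push cycle, the Python sketch below plays two collaborators sharing one central repository. The assumptions are the same as for any local simulation: it needs the `git` command-line tool, uses a bare repository in a temporary directory instead of GitHub, and the collaborators "alice" and "bob", the file names and the helper functions are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output."""
    command = ["git", "-c", "user.name=Demo User",
               "-c", "user.email=demo@example.com", *args]
    result = subprocess.run(command, cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def commit_file(repo, name, message):
    """Create a file in `repo` and commit it locally."""
    (repo / name).write_text(message + "\n", encoding="utf-8")
    git("add", name, cwd=repo)
    git("commit", "-m", message, cwd=repo)


def run_workflow():
    """Two collaborators share changes through one central repository."""
    root = Path(tempfile.mkdtemp())
    remote = root / "remote.git"   # local stand-in for GitHub
    remote.mkdir()
    git("init", "--bare", cwd=remote)

    # Alice publishes the first version.
    git("clone", str(remote), "alice", cwd=root)
    alice = root / "alice"
    commit_file(alice, "readme.txt", "Initial version")
    git("push", "origin", "HEAD", cwd=alice)

    # Bob starts from that version.
    git("clone", str(remote), "bob", cwd=root)
    bob = root / "bob"
    branch = git("rev-parse", "--abbrev-ref", "HEAD", cwd=bob)

    # Both commit locally; Alice shares her change first.
    commit_file(alice, "alice.txt", "Alice's change")
    git("push", "origin", "HEAD", cwd=alice)
    commit_file(bob, "bob.txt", "Bob's change")

    # Bob has committed; he now pulls with rebase, then pushes.
    git("pull", "--rebase", "origin", branch, cwd=bob)
    git("push", "origin", "HEAD", cwd=bob)

    # History on the central repository, newest first.
    return git("log", "--format=%s", branch, cwd=remote).splitlines()


if shutil.which("git") is None:
    print("git is not installed; skipping the demonstration")
else:
    print(run_workflow())
```

Because Bob pulls with rebase, his commit is replayed on top of Alice's, so the central history stays linear without a separate merge commit.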